SriShell Primo: A Predictive Sinhala Text Input System
Abstract
Sinhala, an official language of Sri Lanka, is one of the less privileged languages and still has no established text input method. Like many Asian languages, Sinhala has a large character set, so an input method must convert a key sequence into a character or word. This paper proposes a novel word-based predictive text input system named SriShell Primo. The system lets the user enter a Sinhala word with a key sequence that closely matches his or her intuition of its pronunciation. The key to this approach is a pre-compiled table listing the conceivable roman character sequences that a wide range of users employ to represent a consonant, a consonant sign, or a vowel. As the user enters each key, the system refers to this table and generates possible character strings as candidate Sinhala words. Thanks to a trie-structured word dictionary and a fast search algorithm, the system successively and efficiently narrows the candidates down to possible Sinhala words. The experimental results show that the system greatly improves user-friendliness compared with earlier character-based input systems while maintaining high efficiency.
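The table-plus-trie lookup described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny `ROMAN_TABLE`, the four-word dictionary, the greedy `segment` helper, and the function names are hypothetical and are not taken from SriShell Primo, whose actual table and search algorithm are not given here. The sketch stores each word in a trie keyed by Sinhala units, then walks the trie while matching the typed roman keys against every spelling the table allows for each unit.

```python
# Hypothetical romanization table (NOT the paper's table): each Sinhala unit
# maps to the roman spellings that different users might type for it.
ROMAN_TABLE = {
    "අ": ["a"],
    "ම": ["ma"],
    "ම්": ["m"],
    "මා": ["maa", "ma"],
    "ල": ["la"],
}

# Toy dictionary, assumed for illustration only.
WORDS = ["අම්මා", "මම", "මාමා", "මල"]


def segment(word):
    """Split a word into table units, preferring the longer (2-code-point) unit."""
    units, i = [], 0
    while i < len(word):
        for size in (2, 1):
            unit = word[i:i + size]
            if unit in ROMAN_TABLE:
                units.append(unit)
                i += len(unit)
                break
        else:
            raise ValueError(f"no table entry at position {i} of {word!r}")
    return units


class TrieNode:
    def __init__(self):
        self.children = {}  # Sinhala unit -> TrieNode
        self.word = None    # set when a dictionary word ends at this node


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for unit in segment(word):
            node = node.children.setdefault(unit, TrieNode())
        node.word = word
    return root


def collect(node, found):
    """Add every dictionary word at or below this node to `found`."""
    if node.word is not None:
        found.add(node.word)
    for child in node.children.values():
        collect(child, found)


def candidates(root, keys):
    """Return dictionary words with some romanization that starts with `keys`."""
    found = set()

    def walk(node, remaining):
        if not remaining:
            collect(node, found)  # all keys consumed: offer every completion
            return
        for unit, child in node.children.items():
            for roman in ROMAN_TABLE[unit]:
                if remaining.startswith(roman):
                    walk(child, remaining[len(roman):])
                elif roman.startswith(remaining):
                    collect(child, found)  # keys end partway through this unit

    walk(root, keys)
    return sorted(found)


ROOT = build_trie(WORDS)
```

With this toy data, `candidates(ROOT, "mama")` yields both මම and මාමා, because the table accepts `ma` as a spelling of මා as well as of ම. Each additional key prunes the trie branches that no spelling can reach, which is the narrowing behaviour the abstract describes; a real system would also need to rank the surviving candidates.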
Similar resources
Festival-si: A Sinhala Text-to-Speech System
This paper brings together the development of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and practical applications of it. Construction of a diphone database and implementation of the natural language processing modules are described. The paper also presents the development methodology of direct Sinhala Unicode text input by rewriting Letter-to-Sound rules in...
NLP Applications of Sinhala: TTS & OCR
This paper brings together the practical applications and the evaluation of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and an Optical Character Recognition system for Sinhala.
Dialogue Act Recognition for Text-based Sinhala
This paper discusses the application of classical machine learning approaches to the task of Dialogue Act Recognition for text-based Sinhala. A study was carried out to identify a dialogue act tag set for Sinhala. A new corpus using Sinhala subtitles for English movies was created and was annotated with the selected dialogue acts. Evaluation of the dialogue act recognition system was performed ...
Creation of an IT Enabled Sinhala to Braille Conversion Engine
Text-to-Braille converter software is currently available for various languages, but no such converter exists for Sinhala. As a result, visually impaired people in Sri Lanka who work in Sinhala are at a considerable disadvantage: they must produce the desired output manually, which is time-consuming. This software / conversion engine will easily convert the S...
Corpus-based Sinhala Lexicon
A lexicon is an important resource in any kind of language processing application. Corpus-based lexica have several advantages over other traditional approaches. The lexicon developed for Sinhala was based on the text obtained from a corpus of 10 million words drawn from diverse genres. The words extracted from the corpus have been labeled with parts of speech categories defined according to a no...